Analytical Studies of Strategies for Utilization of Cache Memory in Computers

We analyze quantitatively several strategies for better utilization of the cache, or the fast access
memory, in computers. We define a performance factor $\alpha$ that denotes the fraction of the cache
area utilized when the main memory is accessed at random. We calculate $\alpha$ exactly for different
competing strategies, including the hash-rehash and the skewed-associative strategies, which were
earlier analyzed via simulations.

The memory of a computer is organized in pages. A single page consists of several words. As the
computer performs its computations, it reads from and writes into memory. When a word of main
memory is accessed, the entire page containing the word is fetched and stored in a more accessible
part of the computer's hardware called the cache, which admits fast access. Subsequent accesses
are likely to be for words in this page, and hence the average memory access time is considerably
reduced.

Since fast memory is expensive, the cache usually holds 
far fewer pages than the main memory. When a program 
requires a certain page that is not already in the cache, 
the computer has to fetch it from the main memory. But 
where in the cache should this new page be placed? The 
placements in the cache of these incoming pages must be 
organized so that (i) very little time is spent in locating 
the page where the word is stored in the cache and (ii) 
the cache area is maximally utilized, i.e., ideally no page
should be sent back to the main memory while there
is space left in the cache.

To fix our notation, we assume that the main memory has $M = 2^m$ pages and the cache has
$N = 2^n$ pages, where $m \geq n$. We use $m$-bit and $n$-bit 0-1 strings as addresses for pages in
the main memory and the cache respectively. The page of the main memory corresponding to the
address $a \in \{0,1\}^m$ will be denoted by $P_a$. Similarly, $Q_b$ will denote the page in the cache
corresponding to the address $b \in \{0,1\}^n$. When an access to page $P_a$ is made in the main
memory, it has to be brought to a certain page $Q_b$ in the cache. So the question is: what is the
best strategy for choosing $Q_b$ (i.e., cache organization) such that the cache area is maximally
utilized?

There are two extreme strategies for cache organization. Strategy 1, usually called the direct
mapping strategy, assigns a fixed location $Q_b$ in the cache for each page $P_a$ of the main memory,
where $b$ is the first $n$ bits of $a$. That is, each time $P_a$ is fetched into the cache, it will be
placed in page $Q_b$; if there is already a page of the main memory residing at $Q_b$, then that page
will be sent back to the main memory (the main memory will be updated) and $P_a$ will replace it in
the cache. In this strategy, when an access to a page $P_a$ is made, we know exactly where to find it
in the cache, namely $Q_b$. Thus it performs well on point (i), as no time is wasted in searching.
However, it performs poorly on point (ii), as a page can be sent out even when most of the cache is
unused.

The second strategy, Strategy 2, usually called the associative mapping strategy, allows $P_a$ to
reside anywhere in the cache; if all the pages in the cache are already occupied, one of them, chosen
according to some rule (e.g. least recently used), will be sent back to the main memory to make
space for $P_a$. In this strategy, when a page $P_a$ is accessed, determining if it is already present
in the cache can be expensive, both in the time taken and in the hardware needed to implement the
search. Its advantage, however, lies in the utilization of the cache area, since no page in the cache
is sent out unless the cache is full.

Thus Strategy 1, though preferable from the point of view of design, is likely to be inferior to
Strategy 2 in utilization of the memory available in the cache. In order to improve the performance
of caches, several other strategies, which try to combine the advantages of Strategies 1 and 2, have
been proposed. In this Letter, we will primarily be concerned with three such strategies, A, B and C,
described below. These strategies perform considerably better than Strategy 2 on point (i), and are
found to be better than Strategy 1 on point (ii). Our goal in this Letter is to compare quantitatively
the performances of these various strategies.

Strategy A: This is known as the hash-rehash strategy. In this strategy, each page of the main
memory is allowed to reside in two locations $Q_{b_1}$ and $Q_{b_2}$ in the cache that are determined
as follows: $b_1$ is the string of the first $n$ bits of $a$ and $b_2$ is the string of the next $n$ bits
of $a$. When a page $P_a$ from the main memory is brought to the cache, it is first put in $Q_{b_1}$
provided $Q_{b_1}$ is empty; if $Q_{b_1}$ is occupied, we place $P_a$ in $Q_{b_2}$. (If there is some page
$P_{a'}$ residing at $Q_{b_2}$, then it is replaced by $P_a$ and is sent back to the main memory.)

Strategy B: This is known as the two-way skewed-associative strategy. In this case, we divide the
cache into two banks $Q$ and $Q'$, each with capacity $2^{n-1}$. The pages of the two banks are
denoted by $Q_b$ and $Q'_b$ respectively, where $b \in \{0,1\}^{n-1}$. With each page $P_a$ are
associated two pages $Q_{b_1}$ and $Q'_{b_2}$, one from each bank. When a page $P_a$ is brought to the
cache, we first try to store $P_a$ in the first bank at location $Q_{b_1}$, and if that fails, we store
it in the second bank at location $Q'_{b_2}$.

Strategy C: This is known as the two-way set-associative strategy. Here the cache is again divided
into two banks. The only difference is that we associate pages $Q_{b_1}$ and $Q'_{b_1}$ with page
$P_a$. That is, if $Q_{b_1}$ is already taken, we try to store the page in the second bank, but at
location $Q'_{b_1}$ (and not $Q'_{b_2}$ as in Strategy B).

Clearly, Strategies A, B and C are variants of Strategy 1. Recently, based on data obtained from
simulations, Seznec and Bodin have strongly advocated the use of skewed-associative caches
(Strategy B). However, no attempt appears to have been made to compare the performance of this
strategy with older strategies within a theoretical framework. In this Letter, we take the first
steps in this direction. In particular, we calculate analytically a quantity called the performance
factor $\alpha$ (defined below) for the various strategies mentioned above.

The cache has room to store $N = 2^n$ pages. If $N$ pages are accessed at random, a perfect strategy
would accommodate them all in the cache, without sending any page back to the main memory. But for
most practical strategies some pages will be sent back to the main memory. We define the
performance factor $\alpha$ of a strategy to be the expected fraction of the randomly accessed $N$
pages (in the $N \to \infty$ limit) that get accommodated in the cache. A perfect strategy has
$\alpha = 1$. For all other strategies, $\alpha < 1$. We show below that
$\alpha_1 = 1 - e^{-1} = 0.63212\ldots$ for Strategy 1, but it increases considerably to
$\alpha_A = (e^2 - 1)/(e^2 + 1) = 0.76159\ldots$,
$\alpha_B = 1 - \frac{1}{2e^2}\left(1 + e^{1 - e^{-2}}\right) = 0.77167\ldots$ and
$\alpha_C = 1 - 2e^{-2} = 0.72932\ldots$ for variants A, B and C respectively.
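
As a rough numerical illustration of the definition of $\alpha$, the following sketch accesses $N$ random page addresses and counts the occupied cache slots for each strategy. It is not taken from the Letter: the function name `estimate_alpha`, the choice $m = 2n$, the seed, and in particular the rule that Strategy B uses the first and the next $n-1$ address bits for its two banks are assumptions made only for this example.

```python
import math
import random


def estimate_alpha(strategy, n=14, seed=0):
    """Fraction of the N = 2**n cache slots occupied after N random accesses.

    strategy is "1" (direct), "A" (hash-rehash), "B" (skewed) or "C" (set-associative).
    """
    N = 1 << n
    m = 2 * n                       # assumed address length (smallest that serves Strategy A)
    half = N // 2
    rng = random.Random(seed)
    occupied = set()
    for _ in range(N):
        a = rng.getrandbits(m)      # address of a randomly accessed page
        first_n = a >> (m - n)      # first n bits of a
        next_n = a & (N - 1)        # next n bits of a (valid because m = 2n)
        if strategy == "1":
            candidates = [first_n]
        elif strategy == "A":
            candidates = [first_n, next_n]
        else:
            b1 = a >> (m - (n - 1))                      # first n-1 bits
            b2 = (a >> (m - 2 * (n - 1))) & (half - 1)   # next n-1 bits (assumed for B)
            if strategy == "C":
                b2 = b1                                  # same index in both banks
            candidates = [b1, half + b2]                 # slot in bank Q, slot in bank Q'
        for slot in candidates:
            if slot not in occupied:
                occupied.add(slot)
                break
        # If every candidate is taken, one page is evicted and occupancy is unchanged.
    return len(occupied) / N


quoted = {
    "1": 1 - math.exp(-1),
    "A": math.tanh(1),                                   # equals (e^2 - 1)/(e^2 + 1)
    "B": 1 - (1 + math.exp(1 - math.exp(-2))) / (2 * math.exp(2)),
    "C": 1 - 2 * math.exp(-2),
}
for s in "1ABC":
    print(s, round(estimate_alpha(s), 4), round(quoted[s], 5))
```

With $N = 2^{14}$ the estimates should agree with the quoted limits only up to statistical fluctuations and finite-size corrections, so the comparison is indicative rather than exact.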

It turns out that $\alpha_1$ and $\alpha_C$ can be computed quite easily using elementary probability
arguments. Let us first compute $\alpha_1$ for Strategy 1 in its original form. Suppose a page $P_a$
of the main memory is chosen at random. The first $n$ bits of $a$ will be uniformly distributed in
$\{0,1\}^n$. Thus, each access corresponds to the choice of one of the $N = 2^n$ pages of the cache,
and we want to know how many distinct pages of the cache are expected to be chosen after $N$ random
pages of the main memory are accessed. In other words, we have $N$ bins and $N$ balls, and each ball
is thrown at random into one of the bins; if the bin is already occupied, then the new ball replaces
the existing ball. (Clearly, it makes no difference to the calculations if we think that the old ball
stays and it is the new ball that is discarded.) What is the expected number of bins that will be
occupied after all $N$ balls have been tried? The probability that any fixed bin remains empty after
$k$ balls have been tried is $(1 - 1/N)^k$. Therefore, the expected fraction $x_1(k)$ of the occupied
bins after $k$ trials is given by



$$x_1(k) = 1 - \left(1 - \frac{1}{N}\right)^k. \qquad (1)$$

The performance factor is then given by

$$\alpha_1 = \lim_{N \to \infty} x_1(N) = 1 - \frac{1}{e} = 0.63212\ldots \qquad (2)$$
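
To see how quickly the finite-$N$ expression $1 - (1 - 1/N)^N$ approaches $1 - 1/e$, one can simply evaluate it for growing $N$. The helper name `x1` and the sample sizes below are illustrative choices, not part of the original text.

```python
import math


def x1(k, N):
    """Expected fraction of occupied bins after k balls thrown into N bins."""
    return 1.0 - (1.0 - 1.0 / N) ** k


alpha_1 = 1.0 - math.exp(-1.0)
for N in (16, 256, 4096, 65536):
    # The gap to the limit shrinks roughly like 1/(2 e N).
    print(N, x1(N, N), x1(N, N) - alpha_1)
```

The values decrease monotonically towards the limit, which suggests that small caches utilize their area slightly better than the asymptotic figure.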

Similar arguments can be used for Strategy C also. Here there are two banks and each bank has
$N' = N/2$ bins. For every bin in the first bank, there is an associated bin in the second bank. A bin
from the first bank is chosen at random and a ball is thrown into that bin. If the bin was already
occupied, then the ball is thrown into the associated bin in the second bank. If the second bin was
also occupied, then the ball is rejected. Let $x_1(k)$ and $x_2(k)$ denote the fraction of occupied
bins in the first and the second bank respectively after $k$ trials. Then clearly
$x_1(k) = 1 - (1 - 1/N')^k$, as given by Eq. (1) with $N$ replaced by $N'$. We also note that the
probability that a bin in the first bank is occupied while its partner in the second bank remains
unoccupied is simply given by $x_1(k) - x_2(k) = \frac{k}{N'}(1 - 1/N')^{k-1}$. This is because, out
of $k$ trials, exactly one (which can be the 1st, 2nd, ..., or the $k$-th trial) should be successful
in choosing the given bin in the first bank and the others should be unsuccessful. So the performance
factor $\alpha_C$ is given by

$$\alpha_C = \lim_{N \to \infty} \frac{x_1(N) + x_2(N)}{2} = 1 - 2e^{-2} = 0.72932\ldots \qquad (3)$$
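
The counting argument for Strategy C can be checked by brute force on a tiny cache. The sketch below enumerates every sequence of first-bank choices and compares the average occupancy with the expression $(x_1 + x_2)/2$; the function names and the sizes used are illustrative assumptions.

```python
import itertools
import math


def x_c_formula(k, n_prime):
    """(x_1 + x_2)/2 for two banks of n_prime bins after k balls."""
    q = 1.0 - 1.0 / n_prime
    x1 = 1.0 - q ** k
    x2 = x1 - (k / n_prime) * q ** (k - 1)
    return 0.5 * (x1 + x2)


def x_c_bruteforce(k, n_prime):
    """Average occupied fraction over all n_prime**k equally likely sequences."""
    total = 0
    for seq in itertools.product(range(n_prime), repeat=k):
        hits = [0] * n_prime
        for b in seq:
            hits[b] += 1
        # A pair holds one ball after its first hit and two after its second.
        total += sum(min(h, 2) for h in hits)
    return total / (n_prime ** k * 2 * n_prime)


print(x_c_formula(4, 3), x_c_bruteforce(4, 3))
print(x_c_formula(2 * 10**6, 10**6), 1 - 2 * math.exp(-2))
```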

Such elementary arguments, however, do not give us the expressions for the fraction of occupied
bins for Strategies A and B, and one unfortunately has to carry out more detailed computations for
those two cases, which we outline below.

We first consider Strategy A. In this case there are two fixed locations $Q_{b_1}$ and $Q_{b_2}$ in
the cache corresponding to a page $P_a$ in the main memory. If $a$ is an $m$-bit string, then $b_1$
consists of the first $n$ bits of $a$ and $b_2$ of the next $n$ bits of $a$ (assuming $m \geq 2n$).
Suppose that an address $a$ is accessed randomly by the computer. As $a$ varies uniformly over
$\{0,1\}^m$, the two corresponding strings $b_1$ and $b_2$ also vary uniformly over $\{0,1\}^n$ and
their distributions are independent. Thus, in the language of balls and bins, we have the following
process. There are $N$ balls and $N$ bins. The balls are placed one after another using the following
strategy. For a given ball, a bin is chosen at random and an attempt is made to place the ball there.
If the bin is empty, the ball occupies the bin; if the bin is occupied, another bin is picked at
random (note that $b_2$ is independent of $b_1$, unlike in Strategy C). If this second bin is empty,
the ball occupies it. However, if this bin is also occupied, then the ball is discarded.

We define $P_A(r,k)$ to be the probability that $r$ bins are occupied after "time" $k$, i.e., after
$k$ trials. Note that in this case each ball is tried at most twice. The evolution equation for
$P_A(r,k)$ is then given by

$$P_A(r,k+1) = \frac{r^2}{N^2}\, P_A(r,k) + \left[1 - \frac{(r-1)^2}{N^2}\right] P_A(r-1,k), \qquad (4)$$



with the "boundary" condition $P_A(0,k) = \delta_{k,0}$ and the "initial" condition
$P_A(r,0) = \delta_{r,0}$. This equation can be written in a compact matrix form,
$[P_A(r,k+1)] = W [P_A(r,k)]$, where $W$ is the $(N+1) \times (N+1)$ evolution matrix whose elements
can be easily read off Eq. (4). This equation can be solved using the standard techniques of
statistical physics, which we outline below. The general solution can be written as

$$[P_A(r,k)] = \sum_{\lambda} a_\lambda\, \lambda^k\, [Q_\lambda(r)], \qquad (5)$$



where $[Q_\lambda(r)]$ is the right eigenvector (with eigenvalue $\lambda$) of the matrix $W$ and the
$a_\lambda$'s are constants to be determined from the initial condition. The eigenvalues of $W$ are
simply $\lambda = l^2/N^2$, where $l = 0, 1, 2, \ldots, N$. The $l$-th right eigenvector satisfies
the equation

$$\frac{r^2}{N^2}\, Q_l(r) + \left[1 - \frac{(r-1)^2}{N^2}\right] Q_l(r-1) = \frac{l^2}{N^2}\, Q_l(r) \qquad (6)$$

for all $r \geq 1$. The generating function $Q_l(z) = \sum_{r=0}^{N} Q_l(r) z^r$ satisfies the
differential equation

$$z^2(1-z)\frac{d^2 Q_l}{dz^2} + z(1-z)\frac{dQ_l}{dz} - (l^2 - N^2 z)\, Q_l = 0. \qquad (7)$$

With a little algebra, it is not difficult to show that the well-behaved solution of this
differential equation is given by $Q_l(z) = z^l (1-z) P^{(1,2l)}_{N-l-1}(2z-1)$ for $l < N$ and
$Q_N(z) = z^N$, where the $P^{(a,b)}_n(x)$'s are Jacobi polynomials defined as

$$P^{(a,b)}_n(x) = \frac{1}{2^n} \sum_{m=0}^{n} \binom{n+a}{m} \binom{n+b}{n-m} (x-1)^{n-m} (x+1)^m.$$



The final solution for the generating function, $P_A(z,k) = \sum_{r=0}^{N} P_A(r,k) z^r$, is given by

$$P_A(z,k) = z^N + \sum_{l=0}^{N-1} a_l\, z^l (1-z)\, P^{(1,2l)}_{N-l-1}(2z-1) \left(\frac{l}{N}\right)^{2k},$$

where the coefficients $a_l$ are determined from the initial condition $P_A(z,0) = 1$. After some
amount of algebra, we get

$$a_l = (2 - \delta_{l,0})\, (-1)^{N-l+1}\, \frac{N}{N-l} \qquad (8)$$

for $l = 0, 1, 2, \ldots, N-1$. Thus the expected fraction of occupied bins after time $k$,
$x_A(k) = \frac{1}{N} \left.\frac{dP_A(z,k)}{dz}\right|_{z=1}$, is given for $k \geq 1$ by

$$x_A(k) = 1 + 2 \sum_{l=1}^{N-1} (-1)^{N-l} \left(\frac{l}{N}\right)^{2k}. \qquad (9)$$

Putting $k = N$ in Eq. (9) and taking the $N \to \infty$ limit, we get the performance factor

$$\alpha_A = \frac{e^2 - 1}{e^2 + 1} = 0.76159\ldots \qquad (10)$$

Next, we consider Strategy B. In this case, the cache area is divided equally into two banks.
Corresponding to every page $P_a$ in the main memory, there are two fixed locations, $Q_{b_1}$ in the
first compartment and $Q'_{b_2}$ in the second. Again $b_1$ and $b_2$ are uniformly and independently
distributed over $\{0,1\}^{n-1}$. The operation on the cache then corresponds to the following problem
with balls and bins. There are $N = 2^n$ bins, half of which, $N' = 2^{n-1}$, belong to the first
compartment and the remaining $N'$ belong to the second. There are $N$ balls and each is placed in the
bins according to the following rules. One of the first $N'$ bins is chosen at random. If the bin is
occupied, one of the $N'$ bins in the second half is chosen at random. If even this bin is occupied,
the ball is discarded.

Let $P_B(r_1, r_2, k)$ denote the probability that after "time" $k$, the first compartment has $r_1$
occupied bins and the second $r_2$ occupied bins. Now the second compartment is tried only if the
first compartment fails to accommodate a given ball. If there are $r_1$ occupied bins in the first
compartment at time $k$, clearly the second compartment has been tried only $(k - r_1)$ times, out of
which $r_2$ attempts have been successful. Noting that for each compartment separately, the local
strategy is exactly as in Strategy 1 defined earlier, it is easy to see that

$$P_B(r_1, r_2, k) = P_1(r_1, k)\, P_1(r_2, k - r_1), \qquad (11)$$

where $P_1(r,k)$ is the probability of having $r$ occupied bins in a compartment of $N'$ bins after $k$
trials and is the same as in Strategy 1. $P_1(r,k)$ can be computed exactly by following the same
steps as used for Strategy A. Using this exact result in Eq. (11) and after some algebra, we finally
find the expected fraction of occupied bins (taking into account both the banks) after $k$ trials,

$$x_B(k) = 1 - \frac{1}{2}\left(1 - \frac{1}{N'}\right)^{k} \left[1 + S(N',k)\right], \qquad (12)$$

where $S(N',k)$ is the sum

$$S(N',k) = \frac{1}{(N'-1)^{N'}} \sum_{r=0}^{N'} (-1)^{N'-r} \binom{N'}{r} N'^{\,r} \left(\frac{r}{N'}\right)^{k}.$$

Putting $k = N$ and taking the limit $N \to \infty$, we finally get the performance factor $\alpha_B$
for Strategy B,

$$\alpha_B = 1 - \frac{1}{2e^2}\left(1 + e^{1 - e^{-2}}\right) = 0.77167\ldots \qquad (13)$$
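
The factorization of the two compartments can be turned into a direct finite-size computation. The sketch below builds the single-compartment occupancy distribution by dynamic programming and then averages the filling of the second compartment over it; the function names and the sizes are assumptions made for illustration, and floating-point arithmetic is used throughout.

```python
import math


def occupancy_distribution(k, bins):
    """Distribution of the number of occupied bins after k balls (Strategy 1 rule)."""
    p = [0.0] * (bins + 1)
    p[0] = 1.0
    for _ in range(k):
        new = [0.0] * (bins + 1)
        for r, pr in enumerate(p):
            if pr == 0.0:
                continue
            new[r] += pr * r / bins                    # ball hits an occupied bin
            if r < bins:
                new[r + 1] += pr * (1.0 - r / bins)    # ball fills an empty bin
        p = new
    return p


def x_b(k, n_prime):
    """Expected occupied fraction of both banks (n_prime bins each) after k balls."""
    q = 1.0 - 1.0 / n_prime
    total = 0.0
    for r1, pr in enumerate(occupancy_distribution(k, n_prime)):
        if pr == 0.0:
            continue                                   # also skips r1 > k
        # The second bank was tried k - r1 times and behaves like Strategy 1.
        mean_r2 = n_prime * (1.0 - q ** (k - r1))
        total += pr * (r1 + mean_r2)
    return total / (2 * n_prime)


alpha_b = 1 - (1 + math.exp(1 - math.exp(-2))) / (2 * math.exp(2))
print(x_b(600, 300), alpha_b)
```

For two bins per bank and two balls the routine returns exactly one half, which can be confirmed by listing the four equally likely outcomes by hand.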



The method illustrated above not only yields the final performance factor but also gives exactly
the full probability distribution of the number of occupied bins at any arbitrary "time" $k$ and for
any cache size $N$. It turns out that there is an easier method (described below) to compute just the
performance factor, which is valid only in the $N \to \infty$ limit. It not only correctly reproduces
the exact results for the $\alpha$'s found above but can further be used to compute the performance
factors for more general and complicated strategies.
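
The finite-$N$ distribution for Strategy A can also be obtained numerically by iterating the process one ball at a time: with $r$ occupied bins, a ball is discarded with probability $(r/N)^2$ and otherwise raises $r$ by one. The sketch below does this in exact rational arithmetic and compares the mean with the alternating sum $1 + 2\sum_{l=1}^{N-1} (-1)^{N-l} (l/N)^{2k}$; the function names are illustrative, and the closed form is used here as an assumed expression to be tested, valid for $k \geq 1$.

```python
from fractions import Fraction


def evolve(N, k):
    """Exact distribution of the number of occupied bins after k balls (hash-rehash)."""
    p = [Fraction(0)] * (N + 1)
    p[0] = Fraction(1)
    for _ in range(k):
        new = [Fraction(0)] * (N + 1)
        for r in range(N + 1):
            stay = Fraction(r * r, N * N)      # both tried bins were occupied
            new[r] += stay * p[r]
            if r < N:
                new[r + 1] += (1 - stay) * p[r]
        p = new
    return p


def mean_fraction(N, k):
    """Expected fraction of occupied bins, computed from the full distribution."""
    return sum(r * pr for r, pr in enumerate(evolve(N, k))) / N


def closed_form(N, k):
    """Alternating-sum expression for the same quantity (k >= 1)."""
    return 1 + 2 * sum((-1) ** (N - l) * Fraction(l, N) ** (2 * k) for l in range(1, N))


print(mean_fraction(4, 2), closed_form(4, 2))
```

Because the arithmetic is exact, agreement between the two routes is an equality of fractions rather than a numerical coincidence.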

Consider, for example, a generalized version of Strategy A where, instead of 2 fixed locations per
page in the cache, one now has $p$ fixed locations. In the "ball and bin" language, each ball is
tried at most $p$ times before being discarded. Let $r(k)$ be the random variable that denotes the
number of occupied bins after discrete "time" $k$. Clearly, the increment
$\Delta r(k) = r(k+1) - r(k)$ is also a random variable that takes the value 0 with probability
$(r/N)^p$ (when all $p$ trials for a given ball are unsuccessful) and 1 with probability
$1 - (r/N)^p$. Taking expectations, we get
$\langle r(k+1)\rangle - \langle r(k)\rangle = 1 - \langle (r/N)^p \rangle$. We then define $x = r/N$
and $t = k/N$ and note that they become continuous in the $N \to \infty$ limit, and
$\langle x \rangle$ evolves as

$$\frac{d\langle x\rangle}{dt} = 1 - \langle x^p \rangle. \qquad (14)$$

Using the method of bounded differences, it can be shown that for each $t \in [0,1]$ and
$\epsilon > 0$,

$$\Pr\left[\,|x(t) - \langle x(t)\rangle| > \epsilon\,\right] \leq 2\exp\left(-2\epsilon^2 N/p\right).$$

From this, it follows that $|\langle x^p\rangle - \langle x\rangle^p| \leq O(N^{-1})$. Thus, one can
neglect fluctuations in the $N \to \infty$ limit. Using this in Eq. (14), one gets a closed equation
for $\langle x\rangle(t)$, which can then be integrated to give the performance factor,
$\alpha_p = \langle x\rangle(t = 1)$. For general $p$, we get $\alpha_p$ as the solution of the
equation

$$\int_0^{\alpha_p} \frac{dy}{1 - y^p} = 1. \qquad (15)$$



For example, for $p = 1$ and $p = 2$, we reproduce respectively $\alpha_1 = 1 - e^{-1}$ and
$\alpha_A = (e^2 - 1)/(e^2 + 1)$ from Eq. (15). For large $p$, we find from Eq. (15),

$$\alpha_p \simeq 1 - \frac{\ln 2}{p}. \qquad (16)$$

Thus as $p$ increases, the deviation from perfect behaviour, $(1 - \alpha_p)$, decays as a power
law $1/p$.
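
The continuous-time description is easy to integrate numerically. The sketch below advances $dx/dt = 1 - x^p$ from $x(0) = 0$ to $t = 1$ with a classical fourth-order Runge-Kutta step; the function name `alpha_p`, the step count and the choice of integrator are assumptions made for this illustration rather than anything prescribed in the text.

```python
import math


def alpha_p(p, steps=4000):
    """Integrate dx/dt = 1 - x**p from x(0) = 0 to t = 1 and return x(1)."""
    def rhs(x):
        return 1.0 - x ** p

    h = 1.0 / steps
    x = 0.0
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


print(alpha_p(1), 1 - math.exp(-1))      # one location per page
print(alpha_p(2), math.tanh(1))          # two locations per page
for p in (10, 100, 1000):
    # p * (1 - alpha_p) should drift towards a constant if the deviation decays as 1/p.
    print(p, p * (1 - alpha_p(p)))
```

For the values of $p$ tried here the scaled deviation $p(1 - \alpha_p)$ settles slowly, with corrections that themselves appear to fall off like $1/p$.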



In a similar fashion, one can consider a generalized version of Strategy B where, instead of 2
equal compartments in the cache, one now considers $p$ equal compartments. In this case, once again
the performance factor for general $p$ can be easily computed using the continuous time method.
Apart from reproducing the result for $p = 2$, we find that for large $p$,

    ... (e - 1) p    (17)



Another generalization of Strategy B would be to consider 2 banks only, but of unequal sizes $N$
and $\gamma N$ respectively. Then a similar calculation shows that the performance factor, which is
now a function of $\gamma$, is given by

$$\alpha_B(\gamma) = 1 - \frac{e^{-(1+\gamma)}}{1+\gamma} - \frac{\gamma}{e\,(1+\gamma)} \exp\left[-\frac{e^{-(1+\gamma)}}{\gamma}\right]. \qquad (18)$$

This function has a single maximum at $\gamma = 0.68932\ldots$, where
$\alpha_B(\max) = 0.775862\ldots$, clearly better than $\alpha_B(1) = 0.77167\ldots$, the result
for two equal banks.
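
The location of the optimum can be reproduced with a few lines of code. The sketch below assumes that $(1+\gamma)N$ pages are accessed, fills the first bank of $N$ bins and then the second bank of $\gamma N$ bins in the large-$N$ limit, and maximizes the resulting fraction by golden-section search; the function names, the search interval and the tolerance are illustrative assumptions.

```python
import math


def alpha_b_gamma(gamma):
    """Large-N occupied fraction for banks of relative size 1 and gamma."""
    load = 1.0 + gamma                       # balls per first-bank bin
    empty_first = math.exp(-load)            # fraction of first bank left empty
    overflow = gamma + empty_first           # balls passed on, per first-bank bin
    filled_second = gamma * (1.0 - math.exp(-overflow / gamma))
    return ((1.0 - empty_first) + filled_second) / (1.0 + gamma)


def golden_argmax(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d                            # maximum lies in [a, d]
        else:
            a = c                            # maximum lies in [c, b]
    return 0.5 * (a + b)


gamma_star = golden_argmax(alpha_b_gamma, 0.1, 3.0)
print(gamma_star, alpha_b_gamma(gamma_star), alpha_b_gamma(1.0))
```

The gain over equal banks is small, a few parts in a thousand, so in practice the choice of $\gamma$ would probably be dictated by other design constraints.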

In summary, we have studied analytically the performance of several strategies for cache
utilization. We have derived exact values for cache utilization factors for these strategies under
the assumption that pages in the main memory are accessed at random.

Cache utilization is just one of several issues in the design of caches. Indeed, it has been
reported that two-way set-associative caches require smaller execution times than hash-rehash
caches, although our analysis indicates that the latter utilize the cache better.

To get performance values that will be useful to designers of caches, one should take into account
the amount of time spent in decoding addresses and updating tables, and the cost of hardware. Also,
instead of assuming that the pages of the main memory are accessed at random, a better
understanding of the access patterns in typical applications might be needed.

We are grateful to Abhiram Ranade for bringing the question of analysis of skewed-associative
caches to our attention; in particular, the idea of using unequal banks in this strategy is due to
him. We also thank D. Dhar for useful discussions.